-
-
Notifications
You must be signed in to change notification settings - Fork 7k
Strings without null-termination #1936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I just added one more commit: "Add String::concat(const uint8_t *, unsigned int) version". |
If a concat(pointer, len) is going to be added, perhaps "const void *" would be better? |
@PaulStoffregen Hmm, that might make things easier. On the other hand, having a void* version might complicate overloading, or make it easier for people to accidentally pass a different kind of pointer without the compiler warning them about that. These might not be important for the Does this make sense to you, or do you see things differntly? |
@PaulStoffregen, would be great if you could respond to my last comment here. |
@cmaglie, any chance you get merge this for the next release? I just ran into trouble again by not having a |
7d91ced
to
c5bd195
Compare
I've just rebased this PR, so it is again current. I also realized this changed the AVR WString code, but not SAM, so I updated each commit to apply the same changes to both. |
@ArduinoBot build this please |
I tested this PR against all examples in the repository, which all still work as before (78% even compiles to the same binary). Some sketches become bigger, some sketches became smaller (on average all sketches became 3 bytes bigger. Limiting to only the sketches that actually changed, so the sketches using String in some way, these are 14 bytes bigger on average). All size increases seem to be caused because |
The features here would combine well with my PR here: #3096 In the future, rather than relying on the len input (diff), This would resolve problems if someone was to copy or concatenate a buffer which has trailing nulls.
Whether it be a logical error (buffer isn't as full as expected), or intention (nulls mean something else),
|
One of the goals of this PR is to allow a String to store embedded NUL characters as well (e.g. to store a binary packet in it). IIUC, your suggestion would exactly break that possibility. Having a |
Instead of calling strlen in a dozen places, instead just call String::concat(s), which will then call strlen. This shrinks the code size of these calls significantly, the StringAppendOperator example on the Arduino Uno shrinks by 72 bytes. This change does incur a slight runtime cost, because there is one extra function call.
Previously, these methods took a nul-terminated string and appended it to the current buffer. The length argument (or length of the passed String object) was used to allocate memory, but was not used when actually copying the string. This meant that: - If the length was not correct, or the string passed in was not nul-terminated, the buffer would overflow. - If the string passed in contained embedded nul characters (e.g, in among the first length characters), any characters after the embedded nul would not be copied (but len would be updated to indicate they were). In practice, neither of the above would occur, since the length passed is always the known length of the string, usually as returned by strlen. However, to allow using this concat method to pass in strings that are not nul-terminated, its behaviour should change. This commit changes the method to use memcpy instead of strcpy, copying exactly the number of bytes passed in. For the current calls to this method, which pass a nul-terminated string, without embedded nul characters and a correct length, this behaviour should not change. However, this concat method can now also be used in the two cases mentioned above. Non-nul-terminated strings now work as expected and for strings with embedded newlines the entire string is copied as-is, instead of leaving uninitialized memory after the embedded nul byte. Note that a lot of operations will still only see the string up to the embedded nul byte, but that's not really fixable unless we reimplement functions like strcmp.
Before, it used strncpy, but that has undefined behaviour when the source and destination strings overlap. memove is guaranteed to work correctly in this case. Note that memmove simply copies all bytes, whereas strncpy stopped at the first nul-byte. This means this commit also makes remove work for strings that contain embedded nul bytes.
Now concat(const char*, unsigned int) no longer requires a nul-terminated string, we can simplify the concat(char) method to just pass the address of the single character instead of having to create buffer with nul-termination.
This method is useful when receiving data from external sources that pass an explicit length instead of a NUL-terminated string.
This constructor allows converting a buffer containing a non-nul-terminated string to a String object, by explicitely passing the length.
Before, substring would (temporarily) add a \0 in the original string at the end of the substring, so the substring could be copied into a new String object. However, now that the String::copy() method can work with non-nul-terminated strings (by passing an explicit length), this trick is not needed and we can just call the copy method instead.
This just calls the char* version, but allows calling the method with a uint8_t* as well (which is not uncommon for buffers).
This allows creating a String from a uint8_t[] or uint8_t* as well, without having to add explicit casts.
f821d03
to
deadaa2
Compare
Rebased this PR again, and added one more commit to add a |
I agree, it should be an optional. With regards to this PR, as it doesn't seem to change any existing functionality, and only add features, this can have a benefit (saw the issue you referenced above).
This seems to be a different problem to the issue. By combine, do you mean something like this: String s = "Hello";
s += Obj.func(); I'm not sure I understand what difficulties you are having. |
For example, consider a library that calls a callback function when data is received, passing in a (non-nul-terminated)
Which should just copy the data into the String's buffer, without needing another intermediate copy. For another (pretty much the same) example, see #6546. |
@matthijskooijman do you think it still makes sense to port this PR to API repo? |
I think these changes are still useful. They probably need a bit of rebasing and cleaning up. I'm willing to have a look at resubmitting this, if you (or someone else) will review and merge it? |
I can't assure anything about merging but for sure we'll review them (I think it's a good patch so the merge chances are high). |
Yeah, that's what I meant. Thanks! Pullrequest created at arduino/ArduinoCore-API#97, so I'm closing this one. |
When working with the Arduino String class, I've found that I couldn't efficiently combine it with some external libraries that explicitely pass char* and length around, without nul-terminating their strings. This prompted me to modify and expose the concat (const char* cstr, unsigned int length) method, add a new String(const char* cstr, unsigned int length) constructor. While I was going over the string class, I found some other minor improvements, which are included here. This is a port of arduino/Arduino#1936. The commits are identical, except for some improved commit messages and one commit was dropped since that change was already made by someone else in the meantime. I've provided some testcases by updating the string examples: arduino/Arduino#9239 If this is ok to merge, I'll also provide a PR for the reference documentation. cherry-pick from: arduino/ArduinoCore-API#97
When working with the Arduino String class, I've found that I couldn't efficiently combine it with some external libraries that explicitely pass char* and length around, without nul-terminating their strings. This prompted me to modify and expose the concat (const char* cstr, unsigned int length) method, add a new String(const char* cstr, unsigned int length) constructor. While I was going over the string class, I found some other minor improvements, which are included here. This is a port of arduino/Arduino#1936. The commits are identical, except for some improved commit messages and one commit was dropped since that change was already made by someone else in the meantime. I've provided some testcases by updating the string examples: arduino/Arduino#9239 If this is ok to merge, I'll also provide a PR for the reference documentation. cherry-pick from: arduino/ArduinoCore-API#97
When working with the Arduino String class, I've found that I couldn't efficiently combine it with some external libraries that explicitely pass
char*
and length around, without nul-terminating their strings. This prompted me to modify and expose theconcat (const char* cstr, unsigned int length)
method, add a newString(const char* cstr, unsigned int length)
constructor. While I was going over the string class, I found some other minor improvements, which are included here.